89 research outputs found

    SaberX4: High-throughput Software Implementationof Saber Key Encapsulation Mechanism

    Get PDF
    Saber is a module lattice-based CCA-secure key encapsulation mechanism (KEM) which has been shortlisted for the second round of NIST\u27s Post Quantum Cryptography Standardization project. To attain simplicity and efficiency on constrained devices, the Saber algorithm is serial by construction. However, on high-end platforms, such as modern Intel processors with AVX2 instructions, Saber achieves limited speedup using vector processing instructions due to its serial nature. In this paper we overcome the above-mentioned algorithmic bottleneck and propose a high-throughput software implementation of Saber, which we call `SaberX4\u27, targeting modern Intel processors with AVX2 vector processing support. We apply the batching technique at the highest level of the implementation hierarchy and process four Saber KEM operations simultaneously in parallel using the AVX2 vector processing instructions. Our proof-of-concept software implementation of SaberX4 achieves nearly 1.5 times higher throughput at the cost of latency degradation within acceptable margins, compared to the AVX2-optimized non-batched implementation of Saber by its authors. We anticipate that both latency and throughput of SaberX4 will improve in the future with improved computer architectures and more optimization efforts

    Constant-time BCH Error-Correcting Code

    Get PDF
    Error-correcting codes can be useful in reducing decryption failure rate of several lattice-based and code-based public-key encryption schemes. Two schemes, namely LAC and HQC, in NIST’s round 2 phase of its post-quantum cryptography standardisation project use the strong error-correcting BCH code. However, direct application of the BCH code in decryption algorithms of public-key schemes could open new avenues to the attacks. For example, a recent attack exploited non-constant-time execution of BCH code to reduce the security of LAC. In this paper we analyse the BCH error-correcting code, identify computation steps that cause timing variations and design the first constant-time BCH algorithm. We implement our algorithm in software and evaluate its resistance against timing attacks by performing leakage detection tests. To study the computational overhead of the countermeasures, we integrated our constant-time BCH code in the reference and optimised implementations of the LAC scheme as a case study, and observed nearly 1.1 and 1.4 factor slowdown respectively for the CCA-secure primitive

    Constant-time discrete Gaussian sampling

    Get PDF
    © 2018 IEEE. Sampling from a discrete Gaussian distribution is an indispensable part of lattice-based cryptography. Several recent works have shown that the timing leakage from a non-constant-time implementation of the discrete Gaussian sampling algorithm could be exploited to recover the secret. In this paper, we propose a constant-time implementation of the Knuth-Yao random walk algorithm for performing constant-time discrete Gaussian sampling. Since the random walk is dictated by a set of input random bits, we can express the generated sample as a function of the input random bits. Hence, our constant-time implementation expresses the unique mapping of the input random-bits to the output sample-bits as a Boolean expression of the random-bits. We use bit-slicing to generate multiple samples in batches and thus increase the throughput of our constant-time sampling manifold. Our experiments on an Intel i7-Broadwell processor show that our method can be as much as 2.4 times faster than the constant-time implementation of cumulative distribution table based sampling and consumes exponentially less memory than the Knuth-Yao algorithm with shuffling for a similar level of security

    Arithmetic of Ï„\tau-adic Expansions for Lightweight Koblitz Curve Cryptography

    Get PDF
    Koblitz curves allow very efficient elliptic curve cryptography. The reason is that one can trade expensive point doublings to cheap Frobenius endomorphisms by representing the scalar as a tau-adic expansion. Typically elliptic curve cryptosystems, such as ECDSA, also require the scalar as an integer. This results in a need for conversions between integers and the tau-adic domain, which are costly and hinder the use of Koblitz curves on very constrained devices, such as RFID tags, wireless sensors, or certain applications of the Internet of things. We provide solutions to this problem by showing how complete cryptographic processes, such as ECDSA signing, can be completed in the tau-adic domain with very few resources. This allows outsourcing conversions to a more powerful party. We provide several algorithms for performing arithmetic operations in the tau-adic domain. In particular, we introduce a new representation allowing more efficient and secure computations compared to the algorithms available in the preliminary version of this work from CARDIS 2014. We also provide datapath extensions with different speed and side-channel resistance properties that require areas from less than one hundred to a few hundred gate equivalents on 0.13-mu m CMOS. These extensions are applicable for all Koblitz curves.Peer reviewe

    Exploring RNS for Isogeny-based Cryptography

    Get PDF
    Isogeny-based cryptography suffers from a long-running time due to its requirement of a great amount of large integer arithmetic. The Residue Number System (RNS) can compensate for that drawback by making computation more efficient via parallelism. However, performing a modular reduction by a large prime which is not part of the RNS base is very expensive. In this paper, we propose a new fast and efficient modular reduction algorithm using RNS. Also, we evaluate our modular reduction method by realizing a cryptoprocessor for isogeny-based SIDH key exchange. On a Xilinx Ultrascale+ FPGA, the proposed cryptoprocessor consumes 151,009 LUTs, 143,171 FFs and 1,056 DSPs. It achieves 250 MHz clock frequency and finishes the key exchange for SIDH in 3.8 and 4.9 ms

    PROTEUS: A Tool to generate pipelined Number Theoretic Transform Architectures for FHE and ZKP applications

    Get PDF
    Emerging cryptographic algorithms such as fully homomorphic encryption (FHE) and zero-knowledge proof (ZKP) perform arithmetic involving very large polynomials. One fundamental and time-consuming polynomial operation is the Number theoretic transform (NTT) which is a generalization of the fast Fourier transform. Hardware platforms such as FPGAs could be used to accelerate the NTTs in FHE and ZKP protocols. One major problem is that the FHE and ZKP protocols require different parameter sets, e.g., polynomial degree and coefficient size, depending on their applications. Therefore, a basic research question is: How to design scalable hardware architectures for accelerating NTTs in the FHE and ZKP protocols? In this paper, we present ‘PROTEUS’, an open-source and parametric tool that generates synthesizable bandwidth-efficient NTT architectures for user-specified parameter sets. The architectures can be tuned to utilize different memory bandwidths and parameters which is a very important design requirement in both FHE and ZKP protocols. The generated NTT architectures show a significant performance speedup compared to similar NTT architectures on FPGA. Further comparisons with state-of-the-art show a reduction of up to 23% and 35% in terms of DSP and BRAM utilization

    Teaching HW/SW codesign with a Zynq ARM/FPGA SoC

    Get PDF
    © 2017 IEEE. In this paper we describe a lab session-based hardware/software (HW/SW) codesign course for implementing embedded systems. The goals of the course are to teach the fundamental concepts of embedded system design, develop hands-on HW/SW codesign skills, and to show that there are many possible ways to explore the design space. The reason behind choosing HW/SW codesign approach is that it brings the best of the two worlds: the flexibility of SW and the power/energy/computation efficiency of HW. As an example project, students codesign the well-known RSA public-key cryptosystem in the Xilinx Zybo boards that contain a Xilinx 7-series FPGA coupled with an embedded ARM processing unit. Students are required to explore the design space, weigh the various alternatives and take design decisions. Besides, the project cultivates non-technical skills such as team building and management, sharing of work-load, decision making, presentation and technical report writing
    • …
    corecore